1 Introduction

Recently, we have witnessed a trend in deep learning in which models are rapidly increasing in complexity [84, 211, 220, 90, 205, 286]. However, the host hardware on which these models are deployed has yet to keep up performance-wise, owing to practical limitations such as latency, battery life, and temperature. This results in a large and ever-increasing gap between computational demands and resources. To address this issue, network quantization [48, 199, 115, 149], which maps single-precision floating-point weights or activations to lower-bit integers for compression and acceleration, has attracted considerable research attention.
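
As a concrete illustration of this mapping (a minimal sketch of our own, not the scheme of any particular cited work), uniform affine quantization converts a floating-point tensor to low-bit unsigned integers through a scale and a zero point; the function names below are hypothetical.

    import numpy as np

    def quantize_uniform(x, num_bits=8):
        """Map float values to unsigned num_bits integers (uniform affine scheme).

        Assumes x is not constant and num_bits <= 8 (for the uint8 storage).
        """
        qmin, qmax = 0, 2 ** num_bits - 1
        scale = (x.max() - x.min()) / (qmax - qmin)  # float step per integer step
        zero_point = qmin - x.min() / scale          # integer offset representing 0.0
        q = np.clip(np.round(x / scale + zero_point), qmin, qmax)
        return q.astype(np.uint8), scale, zero_point

    def dequantize(q, scale, zero_point):
        """Recover an approximation of the original float values."""
        return scale * (q.astype(np.float32) - zero_point)

    w = np.random.randn(4, 4).astype(np.float32)
    q, s, z = quantize_uniform(w)
    w_hat = dequantize(q, s, z)  # w_hat approximates w up to quantization error

The same mapping applies to activations. Lowering num_bits trades accuracy for memory and speed, with the 1-bit extreme leading to the binary networks discussed next.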

The binary neural network (BNN) is the simplest version of low-bit networks and has gained much attention due to its highly compressed parameters and activation features [48]. The artificial intelligence company Xnor.ai is perhaps the best-known company focused on BNNs. Founded in 2016, it raised substantial funding to build tools that help AI algorithms run on devices rather than in remote data centers. Apple Inc. acquired the company and planned to apply BNN technology to its devices to keep user information more private and to speed up processing.

This chapter reviews recent advances in BNN technologies well suited for front-end, edge-based computing. We introduce and summarize existing works by classifying them according to gradient approximation, quantization, architecture, loss functions, optimization methods, and binary neural architecture search. We also introduce computer vision and speech recognition applications and discuss future applications of BNNs.

Deep learning has become increasingly important because of its superior performance. Still, it suffers from a large memory footprint and high computational cost, making it difficult to deploy on front-end devices. For example, in unmanned systems, UAVs serve as computing terminals with limited memory and computing resources, making real-time data processing based on convolutional neural networks (CNNs) difficult. To improve storage and computation efficiency, BNNs have shown promise for practical applications. BNNs are neural networks whose weights are binarized. 1-bit CNNs are a highly compressed version of BNNs that binarize both the weights and the activations to decrease the model size and computational cost; these highly compressed models are well suited to front-end computing. In addition to these two, other compression techniques, such as pruning and sparse neural networks, are widely used in edge computing.
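
To make the binarization concrete, the sketch below (our own illustration in the spirit of XNOR-Net-style schemes, not taken from any specific cited method) binarizes a weight tensor with the sign function plus a per-tensor scaling factor; binarize_weights is a hypothetical helper.

    import numpy as np

    def binarize_weights(w):
        """Binarize a float weight tensor to {-1, +1} with a scaling factor.

        alpha = mean(|w|) is the scale that makes alpha * sign(w) the best
        least-squares approximation of w for a fixed sign pattern.
        """
        alpha = np.abs(w).mean()
        b = np.where(w >= 0, 1.0, -1.0).astype(np.float32)
        return b, alpha

    w = np.random.randn(64, 3, 3, 3).astype(np.float32)  # a conv weight tensor
    b, alpha = binarize_weights(w)
    w_hat = alpha * b  # 1-bit approximation: each weight needs only one bit

Storing b as packed bits yields roughly a 32x memory saving over float32 weights, and when the activations are also binarized, convolutions reduce to XNOR and popcount operations.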

This chapter reviews the main advances of BNNs and 1-bit CNNs. Although binarization operations can make neural networks more efficient, they almost always cause a significant performance drop. In the last five years, many methods have been introduced to improve the performance of BNNs. To better review these methods, we describe six aspects: gradient approximation, quantization, structural design, loss design, optimization, and binary neural architecture search. Finally, we also review the object detection, object tracking, and audio analysis applications of BNNs.
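
One source of the difficulty is that the sign function used for binarization has zero gradient almost everywhere, so backpropagation requires a gradient approximation. A common workaround is the straight-through estimator (STE); the sketch below (our own simplified illustration, with hypothetical function names) shows its forward and backward rules.

    import numpy as np

    def sign_forward(x):
        """Forward pass: binarize to {-1, +1}."""
        return np.where(x >= 0, 1.0, -1.0)

    def sign_backward_ste(x, grad_out):
        """Backward pass via the straight-through estimator.

        sign() is flat almost everywhere, so the STE passes the incoming
        gradient through unchanged, zeroed outside |x| <= 1 (hard-tanh clip).
        """
        return grad_out * (np.abs(x) <= 1.0)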
